Snorkel: Beyond Hand-labeled Data
Author
Abstract
This talk describes Snorkel, a software system whose goal is to make routine machine learning tasks dramatically easier. Snorkel focuses on a key bottleneck in the development of machine learning systems: the lack of large training datasets for a user’s task. In Snorkel, a user implicitly defines large training sets by writing simple programs that create labeled data, instead of tediously hand-labeling individual data items. In turn, this allows users to incorporate many sources of training data, some of low quality, to build high-quality models. This talk will describe how Snorkel changes the way users program machine learning models. A key technical challenge in Snorkel is combining heuristic training data that may have uneven and unknown quality and an unknown correlation structure. This talk will explain the underlying theory, including methods to learn both the parameters and structure of generative models without labeled data. Additionally, we’ll describe our recent experiences with hackathons, which suggest the Snorkel approach may allow a broader set of users to train machine learning models and do so more easily than previous approaches. Snorkel is being used by scientists in areas including genomics and drug repurposing, by a number of companies involved in various forms of search, and by law enforcement in the fight against human trafficking. Snorkel is open source on GitHub. Technical blog posts and tutorials are available at Snorkel.Stanford.edu.
Bio: Christopher (Chris) Ré is an associate professor in the Department of Computer Science at Stanford University in the InfoLab who is affiliated with the Statistical Machine Learning Group, Pervasive Parallelism Lab, and Stanford AI Lab. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. His contributions span database theory, database systems, and machine learning, and his work has won best paper awards at a premier venue in each area: PODS 2012, SIGMOD 2014, and ICML 2016, respectively. In addition, work from his group has been incorporated into major scientific and humanitarian efforts, including the IceCube neutrino detector, PaleoDeepDive, and MEMEX in the fight against human trafficking, and into commercial products from major web and enterprise companies. He cofounded a company, based on his research, that was acquired by Apple in
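As a rough illustration of the abstract's idea of writing simple programs that create labeled data, the sketch below applies a few heuristic labeling functions to short texts and combines their votes. Everything in it is an assumption made for illustration: the function names, the keyword heuristics, and the example sentences are hypothetical, and it does not use the Snorkel library. In particular, the plain majority vote here is only a stand-in; the abstract describes learning a generative model over sources of unknown quality and correlation, which this sketch does not do.

```python
from collections import Counter

# Label conventions used only in this sketch (hypothetical).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_contains_causes(text):
    """Heuristic: the word 'causes' hints at a positive relation."""
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_negation(text):
    """Heuristic: explicit negation hints at a negative relation."""
    lowered = text.lower()
    if "does not" in lowered or "no evidence" in lowered:
        return NEGATIVE
    return ABSTAIN


def lf_contains_induce(text):
    """Heuristic: 'induce'/'induces' hints at a positive relation."""
    return POSITIVE if "induce" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_contains_negation, lf_contains_induce]


def majority_vote(votes):
    """Combine votes, ignoring abstentions; abstain on ties or no votes."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_dataset(texts):
    """Apply every labeling function to every text and combine the votes."""
    return [majority_vote([lf(t) for lf in LABELING_FUNCTIONS]) for t in texts]


if __name__ == "__main__":
    examples = [
        "Aspirin causes stomach irritation.",
        "The drug does not affect blood pressure.",
        "The weather was pleasant.",
    ]
    for text, label in zip(examples, label_dataset(examples)):
        print(label, text)
```

Each labeling function may abstain, so a text that no heuristic covers receives no label rather than a guessed one. Treating every function as equally reliable is the main simplification here; weighting sources by estimated accuracy is the part the talk attributes to Snorkel's underlying theory.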
Similar resources
Snorkel: Rapid Training Data Creation with Weak Supervision
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs...
Snorkel: A System for Lightweight Extraction
We describe a vision and an initial prototype system for extracting structured data from unstructured or dark input sources (such as text, embedded tables, images, and diagrams), called Snorkel, in which users write traditional extraction scripts which are automatically enhanced by machine learning techniques. The key technical idea is to view the user’s actions with standard tools as implicitly d...
Wave Characteristics in Breaststroke Technique with and Without Snorkel Use
The purpose of this paper was to examine the characteristics of waves generated when swimming with and without the use of Aquatrainer® snorkels. Eight male swimmers performed two maximal bouts of 25 m breaststroke, first without the use of a snorkel (normal condition) and then using a snorkel (snorkel condition). The body landmarks, centre of the mass velocity, stroke rate, stroke length, strok...
The “Oil-Spill Snorkel”: an innovative bioelectrochemical approach to accelerate hydrocarbons biodegradation in marine sediments
This study presents the proof-of-concept of the "Oil-Spill Snorkel": a novel bioelectrochemical approach to stimulate the oxidative biodegradation of petroleum hydrocarbons in sediments. The "Oil-Spill Snorkel" consists of a single conductive material (the snorkel) positioned suitably to create an electrochemical connection between the anoxic zone (the contaminated sediment) and the oxic zone (...
Effect of nasopharyngeal snorkel on respiratory function in patients with stroke
Stroke causes significant mortality and morbidity. The clinical value of the nasopharyngeal snorkel was investigated in stroke patients with disorders of consciousness. A total of 155 stroke patients were randomly divided into two groups: a nasopharyngeal snorkel was used in the treatment group (n=78) and an oropharyngeal snorkel was used in the control group (n=77). The PaO2 and PCO2 of both g...
Publication date: 2017